class: center, middle, inverse, title-slide

# Lecture 2
## New variables and plots
### Psych 10 C
### University of California, Irvine
### 03/30/2022

---

## Load data into R

- We will keep working with the memory data from last class.
---

## Creating new variables

- Sometimes we would like to work with a transformation of the variables that we have in a data file.

--

- For example, if we're interested in knowing how many of the correctly recalled words came from the first or the second test, we can use the existing variables to add a new variable that contains this information.

--

This can make some plots easier to make!

---

## Creating a new variable

- We will create a new variable that holds the label "test_1" whenever the test time (`time_test`) was 300 seconds after the study phase, and "test_2" when the test time was 3600 seconds after:

```r
library(dplyr)  # provides mutate() and the pipe %>%

# create a new variable using the pipe (%>%): it takes the output
# of the preceding line and uses it as input to the next one
memory <- memory %>%
  mutate("test_id" = ifelse(test = time_test == 300,
                            yes = "test_1",
                            no = "test_2"))

# look at the first 4 rows of the data
head(x = memory, n = 4)
```

```
# A tibble: 4 × 5
     id   age correct time_test test_id
  <dbl> <dbl>   <dbl>     <dbl> <chr>
1     1    20      46       300 test_1
2     2    29      49       300 test_1
3     3    29      48       300 test_1
4     4    25      44       300 test_1
```

---

## Creating a new variable

- The **`mutate()`** function allows us to create new variables from the ones already available in the dataset. **`ifelse()`** is applied inside it to set the condition that decides the value of the new variable in each row.

--

- We will use our new variable to create plots of our data.

---
class: inverse, center, middle

# Plotting

## Histograms

---

## Histograms

- A histogram shows how many times each value (or range of values) appears in our data.

- It is constructed by creating intervals and counting the number of data points that fall in each one.
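
- The counting step can be sketched in base R. This is only an illustration: the `scores` vector and both sets of interval limits are made-up values, not taken from the memory data.

```r
# Hypothetical toy scores (NOT the memory data)
scores <- c(44, 46, 46, 48, 49, 49, 49, 51)

# Intervals of width 1, centred on each whole number from 44 to 51
breaks_narrow <- seq(from = 43.5, to = 51.5, by = 1)
counts_narrow <- table(cut(x = scores, breaks = breaks_narrow))

# Intervals of width 4 that cover the same range
breaks_wide <- seq(from = 43.5, to = 51.5, by = 4)
counts_wide <- table(cut(x = scores, breaks = breaks_wide))

# each table holds the bar heights a histogram would draw
counts_narrow
counts_wide
```

- With the narrow intervals the tallest bar holds 3 scores; with the wide intervals the same scores fall into just two bars, of 3 and 5.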
---
count: false

### Histogram of correct recalls

.panel1-hist-code-auto[
```r
*ggplot(data = memory)
```
]

.panel2-hist-code-auto[
<!-- -->
]

---
count: false

### Histogram of correct recalls

.panel1-hist-code-auto[
```r
ggplot(data = memory) +
* aes(x = correct)
```
]

.panel2-hist-code-auto[
<!-- -->
]

---
count: false

### Histogram of correct recalls

.panel1-hist-code-auto[
```r
ggplot(data = memory) +
  aes(x = correct) +
* aes(fill = test_id, color = test_id)
```
]

.panel2-hist-code-auto[
<!-- -->
]

---
count: false

### Histogram of correct recalls

.panel1-hist-code-auto[
```r
ggplot(data = memory) +
  aes(x = correct) +
  aes(fill = test_id, color = test_id) +
* geom_histogram(position = "identity",
*                binwidth = 1,
*                alpha = 0.4)
```
]

.panel2-hist-code-auto[
<!-- -->
]

---
count: false

### Histogram of correct recalls

.panel1-hist-code-auto[
```r
ggplot(data = memory) +
  aes(x = correct) +
  aes(fill = test_id, color = test_id) +
  geom_histogram(position = "identity",
                 binwidth = 1,
                 alpha = 0.4) +
* theme_classic()
```
]

.panel2-hist-code-auto[
<!-- -->
]

---
count: false

### Histogram of correct recalls

.panel1-hist-code-auto[
```r
ggplot(data = memory) +
  aes(x = correct) +
  aes(fill = test_id, color = test_id) +
  geom_histogram(position = "identity",
                 binwidth = 1,
                 alpha = 0.4) +
  theme_classic() +
* xlab("Number of correct recalls")
```
]

.panel2-hist-code-auto[
<!-- -->
]

---
count: false

### Histogram of correct recalls

.panel1-hist-code-auto[
```r
ggplot(data = memory) +
  aes(x = correct) +
  aes(fill = test_id, color = test_id) +
  geom_histogram(position = "identity",
                 binwidth = 1,
                 alpha = 0.4) +
  theme_classic() +
  xlab("Number of correct recalls") +
* ylab("Frequency")
```
]

.panel2-hist-code-auto[
<!-- -->
]

---
count: false

### Histogram of correct recalls

.panel1-hist-code-auto[
```r
ggplot(data = memory) +
  aes(x = correct) +
  aes(fill = test_id, color = test_id) +
  geom_histogram(position = "identity",
                 binwidth = 1,
                 alpha = 0.4) +
  theme_classic() +
  xlab("Number of correct recalls") +
  ylab("Frequency") +
* guides(fill = guide_legend("Test order"), color = "none")
```
]

.panel2-hist-code-auto[
<!-- -->
]

---
count: false

### Histogram of correct recalls

.panel1-hist-code-auto[
```r
ggplot(data = memory) +
  aes(x = correct) +
  aes(fill = test_id, color = test_id) +
  geom_histogram(position = "identity",
                 binwidth = 1,
                 alpha = 0.4) +
  theme_classic() +
  xlab("Number of correct recalls") +
  ylab("Frequency") +
  guides(fill = guide_legend("Test order"), color = "none") +
* theme(axis.title.x = element_text(size = 20),
*       axis.title.y = element_text(size = 20))
```
]

.panel2-hist-code-auto[
<!-- -->
]

<style>
.panel1-hist-code-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-hist-code-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-hist-code-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>

---

## Histograms

- One of the main problems with histograms is that their shape depends on our choice of bin width (the width of the bars)!

--

- A change in the shape can change our interpretation of the results, so we need to be careful when making our choice.

--

- Histograms can be used for numeric variables.

---
class: inverse, center, middle

# Plotting

## Box plots

---

## Box plots

.pull-left[
1. Box: has 3 marks. Its limits represent the first and third quartiles, and the line inside represents the median (the second quartile).

1. Whiskers: the upper whisker reaches the largest observation that is no more than 1.5 times the interquartile range (the distance between the first and third quartiles) above the third quartile. The lower whisker reaches the smallest observation that is no more than 1.5 times that distance below the first quartile.

1. Everything beyond the whiskers is considered an outlier.
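
The whisker rule can be checked by hand. A minimal sketch in base R, assuming the usual convention of 1.5 times the distance between the first and third quartiles; the `scores` vector is made up and is not the memory data:

```r
# Hypothetical toy scores (NOT the memory data)
scores <- c(38, 44, 45, 46, 47, 48, 49, 50, 60)

# first and third quartiles, and the distance between them
q1  <- quantile(x = scores, probs = 0.25, names = FALSE)
q3  <- quantile(x = scores, probs = 0.75, names = FALSE)
iqr <- q3 - q1

# limits beyond which an observation counts as an outlier
upper_fence <- q3 + 1.5 * iqr
lower_fence <- q1 - 1.5 * iqr

# whiskers stop at the most extreme observations inside the limits
upper_whisker <- max(scores[scores <= upper_fence])
lower_whisker <- min(scores[scores >= lower_fence])

# observations outside the limits are drawn as separate points
outliers <- scores[scores < lower_fence | scores > upper_fence]

c(lower_whisker, upper_whisker)
outliers
```

Here the box runs from 45 to 49, the whiskers end at 44 and 50, and the scores 38 and 60 are flagged as outliers.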
]

.pull-right[
<img src="data:image/png;base64,#lec-2_files/figure-html/ex-bp-1.png" style="display: block; margin: auto;" />
]

--

- We can use the same data as before for an example.

---
count: false

### Box plot of correct responses

.panel1-bp-code-auto[
```r
*ggplot(data = memory)
```
]

.panel2-bp-code-auto[
<!-- -->
]

---
count: false

### Box plot of correct responses

.panel1-bp-code-auto[
```r
ggplot(data = memory) +
* aes(y = correct)
```
]

.panel2-bp-code-auto[
<!-- -->
]

---
count: false

### Box plot of correct responses

.panel1-bp-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
* aes(x = test_id)
```
]

.panel2-bp-code-auto[
<!-- -->
]

---
count: false

### Box plot of correct responses

.panel1-bp-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = test_id) +
* aes(color = test_id)
```
]

.panel2-bp-code-auto[
<!-- -->
]

---
count: false

### Box plot of correct responses

.panel1-bp-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = test_id) +
  aes(color = test_id) +
* scale_color_brewer(palette = "Dark2")
```
]

.panel2-bp-code-auto[
<!-- -->
]

---
count: false

### Box plot of correct responses

.panel1-bp-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = test_id) +
  aes(color = test_id) +
  scale_color_brewer(palette = "Dark2") +
* geom_boxplot(fill = "white")
```
]

.panel2-bp-code-auto[
<!-- -->
]

---
count: false

### Box plot of correct responses

.panel1-bp-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = test_id) +
  aes(color = test_id) +
  scale_color_brewer(palette = "Dark2") +
  geom_boxplot(fill = "white") +
* xlab("Test order")
```
]

.panel2-bp-code-auto[
<!-- -->
]

---
count: false

### Box plot of correct responses

.panel1-bp-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = test_id) +
  aes(color = test_id) +
  scale_color_brewer(palette = "Dark2") +
  geom_boxplot(fill = "white") +
  xlab("Test order") +
* ylab("Number of correct recalls")
```
]

.panel2-bp-code-auto[
<!-- -->
]

---
count: false

### Box plot of correct responses

.panel1-bp-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = test_id) +
  aes(color = test_id) +
  scale_color_brewer(palette = "Dark2") +
  geom_boxplot(fill = "white") +
  xlab("Test order") +
  ylab("Number of correct recalls") +
* guides(fill = "none", color = "none")
```
]

.panel2-bp-code-auto[
<!-- -->
]

---
count: false

### Box plot of correct responses

.panel1-bp-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = test_id) +
  aes(color = test_id) +
  scale_color_brewer(palette = "Dark2") +
  geom_boxplot(fill = "white") +
  xlab("Test order") +
  ylab("Number of correct recalls") +
  guides(fill = "none", color = "none") +
* theme(axis.title.x = element_text(size = 20),
*       axis.title.y = element_text(size = 20))
```
]

.panel2-bp-code-auto[
<!-- -->
]

<style>
.panel1-bp-code-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-bp-code-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-bp-code-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>

---

## Box plots

- Box plots show us how our data is dispersed. For example, the numbers of correctly recalled words are closer together in the first test than in the second.

--

- We can also see that the median number of correctly recalled words was higher in the first test.

--

- This plot also allows us to evaluate whether our data is dispersed symmetrically around the median, or whether it is skewed toward one of the ends.

--

- There are some variables that we would not expect to be symmetric; think about reaction times in a game.

---
class: inverse, center, middle

# Plotting

## Scatter plots

---

## Scatter plots

- Histograms are useful when we have a single numeric variable.

--

- Box plots are very informative about the variability in our data.
--

- Scatter plots are useful when we want to see how two numerical variables "change" together.

---
count: false

### Scatter plot of correct responses vs. age

.panel1-scatter-code-auto[
```r
*ggplot(data = memory)
```
]

.panel2-scatter-code-auto[
<!-- -->
]

---
count: false

### Scatter plot of correct responses vs. age

.panel1-scatter-code-auto[
```r
ggplot(data = memory) +
* aes(y = correct)
```
]

.panel2-scatter-code-auto[
<!-- -->
]

---
count: false

### Scatter plot of correct responses vs. age

.panel1-scatter-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
* aes(x = age)
```
]

.panel2-scatter-code-auto[
<!-- -->
]

---
count: false

### Scatter plot of correct responses vs. age

.panel1-scatter-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = age) +
* aes(color = test_id)
```
]

.panel2-scatter-code-auto[
<!-- -->
]

---
count: false

### Scatter plot of correct responses vs. age

.panel1-scatter-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = age) +
  aes(color = test_id) +
* geom_point(fill = "white")
```
]

.panel2-scatter-code-auto[
<!-- -->
]

---
count: false

### Scatter plot of correct responses vs. age

.panel1-scatter-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = age) +
  aes(color = test_id) +
  geom_point(fill = "white") +
* xlab("Age")
```
]

.panel2-scatter-code-auto[
<!-- -->
]

---
count: false

### Scatter plot of correct responses vs. age

.panel1-scatter-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = age) +
  aes(color = test_id) +
  geom_point(fill = "white") +
  xlab("Age") +
* ylab("Number of correct recalls")
```
]

.panel2-scatter-code-auto[
<!-- -->
]

---
count: false

### Scatter plot of correct responses vs. age

.panel1-scatter-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = age) +
  aes(color = test_id) +
  geom_point(fill = "white") +
  xlab("Age") +
  ylab("Number of correct recalls") +
* guides(fill = "none", color = guide_legend("Test order"))
```
]

.panel2-scatter-code-auto[
<!-- -->
]

---
count: false

### Scatter plot of correct responses vs. age

.panel1-scatter-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = age) +
  aes(color = test_id) +
  geom_point(fill = "white") +
  xlab("Age") +
  ylab("Number of correct recalls") +
  guides(fill = "none", color = guide_legend("Test order")) +
* theme(axis.title.x = element_text(size = 20),
*       axis.title.y = element_text(size = 20))
```
]

.panel2-scatter-code-auto[
<!-- -->
]

---
count: false

### Scatter plot of correct responses vs. age

.panel1-scatter-code-auto[
```r
ggplot(data = memory) +
  aes(y = correct) +
  aes(x = age) +
  aes(color = test_id) +
  geom_point(fill = "white") +
  xlab("Age") +
  ylab("Number of correct recalls") +
  guides(fill = "none", color = guide_legend("Test order")) +
  theme(axis.title.x = element_text(size = 20),
        axis.title.y = element_text(size = 20)) +
* geom_smooth(method = lm)
```
]

.panel2-scatter-code-auto[
<!-- -->
]

<style>
.panel1-scatter-code-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-scatter-code-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-scatter-code-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>
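
---

## Scatter plots

- The last layer of the plot, `geom_smooth(method = lm)`, adds a straight line fitted with `lm()`. Because color is mapped to `test_id`, one line is fitted per test. A minimal sketch of such a fit in base R, using a made-up data frame `toy` (not the memory data):

```r
# Hypothetical toy data (NOT the memory data); the points were chosen
# to lie exactly on the line correct = 58 - 0.4 * age
toy <- data.frame(
  age     = c(20, 25, 30, 35, 40),
  correct = c(50, 48, 46, 44, 42)
)

# fit a straight line: correct = intercept + slope * age
fit <- lm(formula = correct ~ age, data = toy)

# intercept and slope of the fitted line
coef(fit)

# predicted number of correct recalls for a hypothetical 28-year-old
predict(object = fit, newdata = data.frame(age = 28))
```

- A negative slope, as in this toy example, means that the number of correct recalls decreases as age increases.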